Text Document Cluster Analysis Through Visualization of 3D Projections

Authors

  • Masaki Aono
  • Mei Kobayashi
Abstract

Clustering has been used as a tool for understanding the content of large text document sets. As the volume of stored data has increased, so has the need for tools to understand output from clustering algorithms. We developed a new visual interface to meet this demand. Our interface helps non-technical users understand documents and clusters in massive databases (e.g., document content, cluster sizes, distances between clusters, similarities of documents within clusters, extent of cluster overlaps) and evaluate the quality of output from different clustering algorithms. When a user inputs a keyword query describing their interests, our system retrieves and displays documents and clusters in three dimensions. More specifically, given a query and a set of documents modeled as vectors in an orthogonal coordinate system, our system finds the three orthogonal coordinate axes that are most relevant to the query and uses them to generate a display (or users may choose any three orthogonal axes). We conducted implementation studies to demonstrate the value of our system with an artificial data set and a de facto benchmark news article dataset from the United States NIST Text REtrieval Conference (TREC).
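
The axis-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how relevance is measured, so the criterion used here (the three axes on which the query vector has the largest absolute coordinate) is purely an assumption, and the names `top3_axes` and `project_3d` and the sample vectors are hypothetical.

```python
def top3_axes(query):
    """Pick three axis indices for the display.

    ASSUMPTION: an axis is "relevant" when the query has a large
    absolute coordinate on it. The abstract does not define relevance.
    """
    if len(query) < 3:
        raise ValueError("need at least three coordinate axes")
    # sorted() is stable, so ties keep their original axis order.
    order = sorted(range(len(query)), key=lambda i: -abs(query[i]))
    return order[:3]


def project_3d(docs, axes):
    """Keep only the chosen three coordinates of every document vector."""
    return [tuple(doc[i] for i in axes) for doc in docs]


# Hypothetical document vectors, already expressed in an orthogonal basis.
docs = [
    [0.9, 0.0, 0.1, 0.3, 0.0],
    [0.0, 0.8, 0.0, 0.1, 0.5],
    [0.2, 0.1, 0.7, 0.0, 0.4],
]
# Hypothetical query vector in the same basis.
query = [0.1, 0.0, 0.6, 0.9, 0.3]

axes = top3_axes(query)          # chosen display axes
points = project_3d(docs, axes)  # one (x, y, z) point per document
```

Because the basis is orthogonal, dropping all but three coordinates is an orthogonal projection onto the subspace spanned by the chosen axes. A user-selected triple of axes can be passed to `project_3d` in place of `axes`.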

Similar resources

The Shape of Shakespeare: Visualizing Text using Implicit Surfaces

Information visualization focuses on the use of visual means for exploring non-visual information. While free-form text is a rich, common source of information, visualization of text is a challenging problem since text is inherently non-spatial. This paper explores the use of implicit surface models for visualizing text. We describe several techniques for text visualization that aid in understa...

3D Grand Tour for Multidimensional Data and Clusters

Grand tour is a method for viewing multidimensional data via linear projections onto a sequence of two dimensional subspaces and then moving continuously from one projection to the next. This paper extends the method to 3D grand tour where projections are made onto three dimensional subspaces. 3D cluster-guided tour is proposed where sequences of projections are determined by cluster centroids....

Visualization and Clustering of Document Collections using a Flock-based Swarm Intelligence Technique

Electronic availability of documents continues to increase, yet identifying documents relevant to the user remains a primary constraint in electronic document use. Visual representations of document collections can facilitate search by representing large collections of documents in a manner that is complementary to linear, text based representations. Visual representations can provide a means t...

Dokumentbezogenes Wissensmanagement in dynamischen Arbeitsgruppen: Text Mining, Clustering und Visualisierung

In this paper, we present the research prototype of a visualization system for document sets. Starting with an analysis of typical information needs in small to medium-sized work groups, disadvantages of the current strategies for exploring document sets are identified. Based on a review of recent work in information and document visualization we present the architecture of a document visualiza...

“SmartDoc” - 3D Dynamic Interactive Documents

Traditional documents are characterized by the paper medium, normally restricted to contain static items such as text, images and sometimes animations. They are presented in hierarchical chapter structures, passive for the reader and basically linear to read. Integrating existing visualization methods with innovative documentation technology can increase the quality of information in technical ...

Publication date: 2012